Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
Abstract:
Automatic recognition of printed and handwritten text eases data entry and digitization, and is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters; today, thanks to advances in machine learning, efforts are being made to recognize sequences of handwritten words. Based on the type of text, document recognition is generally divided into two main categories: printed and handwritten. Because the number of fonts is limited compared with the diversity of handwriting across writers, printed text is much easier to recognize than handwritten text; printed-text recognition technology has therefore matured and is available in commercial products. Handwriting recognition is usually performed in one of two modes, online or offline; offline handwriting recognition is the automated conversion of text in image format into characters that can be used in computer and text-processing applications. Most research on handwriting recognition has been conducted on Latin script, and a variety of tools and resources have been developed for that script. This article focuses on applying the latest speech recognition methods to the recognition of Arabic handwriting. The task of modeling and recognizing handwritten text is very similar to the task of modeling and recognizing speech, so approaches developed for speech recognition can be applied to handwriting recognition with only slight changes. With the spread of hybrid HMM-DNN approaches and the use of sequence-level objective functions such as maximum mutual information (MMI), significant improvements have been made in the accuracy of speech recognition systems.
This paper presents a pipeline for offline Arabic handwritten text recognition built on the open-source KALDI toolkit, which is well known in the speech recognition community, together with the latest hybrid models it provides and data augmentation techniques. The experiments were conducted on the Arabic KHATT database and achieved a 7.32% absolute reduction in word error rate (WER).
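As an illustration of the metric reported above, the following is a minimal sketch of how word error rate and an absolute reduction in it are commonly computed. It is not taken from the paper or from KALDI; the function names and the toy word sequences are hypothetical, and the numbers in the usage lines are placeholders rather than the paper's results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i reference words against an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Absolute reduction is a plain difference of the two rates (percentage points)."""
    return baseline_wer - new_wer


# Hypothetical toy example: one substitution and one deletion over four words.
toy_wer = word_error_rate("w1 w2 w3 w4", "w1 wX w3")

# Hypothetical placeholder rates, not figures from the paper.
toy_gain = absolute_reduction(0.30, 0.25)
```

An absolute reduction is the difference between two error rates, as opposed to a relative reduction, which divides that difference by the baseline rate.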
Similar Resources
Off-line Handwritten Arabic Character Recognition: A Survey
The automatic recognition of text on scanned images has several applications such as automatic postal mail sorting and searching in large volume of documents. Although Arabic handwritten text recognition has been addressed by many researchers, it remains a challenging task due to several factors. This paper presents an overview of off-line handwritten Arabic character recognition and summarizes...
Off-line System for the Recognition of Handwritten Arabic Character
Recognition of handwritten Arabic text awaits accurate recognition solutions. There are many difficulties facing a good handwritten Arabic recognition system such as unlimited variation in human handwriting, similarities of distinct character shapes, and their position in the word. The typical Optical Character Recognition (OCR) systems are based mainly on three stages, preprocessing, features ...
Recognition of Off-Line Handwritten Arabic Words Using Hidden Markov Model Approach
Hidden Markov Models (HMM) have been used with some success in recognizing printed Arabic words. In this paper, a complete scheme for totally unconstrained Arabic handwritten word recognition based on a Model discriminant HMM is presented. A complete system able to classify Arabic-Handwritten words of one hundred different writers is proposed and discussed. The system first attempts to remove s...
HMM-based handwritten symbol recognition using on-line and off-line features
This paper addresses the problem of recognizing on-line sampled handwritten symbols. Within the proposed symbol recognition system based on Hidden Markov Models different kinds of feature extraction algorithms are used analysing on-line features as well as off-line features and combining the classification results. By conducting writer-dependent recognition experiments, it is demonstrated that ...
A hybrid recognition system for off-line handwritten characters
Computer based pattern recognition is a process that involves several sub-processes, including pre-processing, feature extraction, feature selection, and classification. Feature extraction is the estimation of certain attributes of the target patterns. Selection of the right set of features is the most crucial and complex part of building a pattern recognition system. In this work we have combi...
A Novel Comprehensive Database for Arabic Off-Line Handwriting Recognition
This paper presents the work toward developing a new comprehensive database for Arabic off-line handwriting recognition. The database includes: isolated Indian digits, numerical strings, Arabic isolated letters, and a collection of 70 Arabic words. Also, the database includes a free format sample of an Arabic date. A data entry form was designed to collect written samples from Arabic native spe...
Journal title
Volume 17, Issue 4
Pages 155-168
Publication date 2021-02